Quickstart - Customer Churn Full Suite Model Documentation

This interactive notebook guides you through documenting a model using the ValidMind Developer Framework. We will use a sample dataset provided by the library and train a simple classification model.

For this simple demonstration, we will use the following bank customer churn dataset from Kaggle: https://www.kaggle.com/code/kmalit/bank-customer-churn-prediction/data.

We will train a sample model and demonstrate the documentation functionalities of the library.

Before Starting (Important)

Click File > Save a copy in Drive to make your own copy in Google Drive so that you can modify the notebook.

Alternatively, you can download the notebook source and work with it in your own developer environment.

Install ValidMind Developer Framework

!pip install validmind
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: validmind in /usr/local/lib/python3.10/dist-packages (1.11.6)

Note: Colab may generate the following warning after running the first cell:

WARNING [...]
You must restart the runtime in order to use newly installed versions

If you see this, please click on “Restart runtime” and continue with the next cell.
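
After restarting the runtime, it can be useful to confirm which version of the package is actually installed. The following is a minimal sketch using only the standard library; the helper name installed_version is our own and is not part of ValidMind.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed version, or None if the package is not installed.
print("validmind:", installed_version("validmind"))
```

If this prints None, the install cell has not taken effect in the current runtime.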

Initializing the Python environment

import pandas as pd
import xgboost as xgb

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

%matplotlib inline

Initializing the ValidMind Client Library

Log in to the ValidMind platform with your registered email address, and navigate to the Documentation Projects page.

Creating a new Documentation Project

(Note: if a documentation project has already been created, you can skip this section and head directly to “Finding the project API key and secret”.)

Clicking on “Create a new project” allows you to register a new documentation project for our demo model.

Select “Customer Churn model” from the Model drop-down, and “Initial Validation” as Type. Finally, click on “Create Project”.

Finding the project API key and secret

In the “Client Integration” page of the newly created project, you will find the initialization code that allows the client library to associate documentation and tests with the appropriate project. The initialization code configures the following arguments:

  • api_host: Location of the ValidMind API.
  • api_key: Account API key.
  • api_secret: Account Secret key.
  • project: The project identifier. The project argument is mandatory since it allows the library to associate all data collected with a specific account project.

The code snippet can be copied and pasted directly into the cell below to initialize the ValidMind Developer Framework when run:

## Replace the code below with the code snippet from your project ## 




import validmind as vm

vm.init(
  api_host = "https://api.prod.validmind.ai/api/v1/tracking",
  api_key = "b21d0d2e5bdaaaa550a7996e89a2354e",
  api_secret = "3b56efe570096f7d3dc60d42c66a56fa92da94ad0e280a58cb6e8ccca035c664",
  project = "clhowg73e001s1pk10uouvsde"
)
  
  
  
Connected to ValidMind. Project: [Quickstart] Customer Churn Model - Initial Validation (clhowg73e001s1pk10uouvsde)
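
If you prefer not to keep the API key and secret in the notebook itself, one option is to read them from environment variables. The sketch below is only an illustration: the variable names (VM_API_HOST, VM_API_KEY, VM_API_SECRET, VM_PROJECT) and the helper load_vm_credentials are assumptions of ours, not names defined by ValidMind. It only builds the four keyword arguments described above.

```python
import os

# Hypothetical environment variable names mapped to the init arguments
# described above. Choose whatever names fit your own setup.
ENV_TO_ARG = {
    "VM_API_HOST": "api_host",
    "VM_API_KEY": "api_key",
    "VM_API_SECRET": "api_secret",
    "VM_PROJECT": "project",
}


def load_vm_credentials(env=None):
    """Build the keyword arguments for the init call from a mapping of variables."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_TO_ARG if not env.get(name)]
    if missing:
        # Fail early with a readable message instead of a confusing API error.
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[name] for name, arg in ENV_TO_ARG.items()}
```

With the variables set, the initialization cell could then be written as vm.init(**load_vm_credentials()).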

Load the Demo Dataset

For the purpose of this demonstration, we will use a sample dataset provided by the ValidMind library.

# Import the sample dataset from the library
from validmind.datasets.classification import customer_churn as demo_dataset
# You can try a different dataset with: 
#from validmind.datasets.classification import taiwan_credit as demo_dataset

df = demo_dataset.load_data()

Initialize a dataset object for ValidMind

Before running the test plan, we must first initialize a ValidMind dataset object using the init_dataset function from the vm module. This function takes the following arguments: dataset, the dataset that we want to analyze; target_column, the name of the target variable; and class_labels, the labels used for classification model training.

vm_dataset = vm.init_dataset(
    dataset=df,
    target_column=demo_dataset.target_column,
    class_labels=demo_dataset.class_labels
)
Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...
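
To make the roles of target_column and class_labels more concrete, here is a small standard-library sketch that summarizes the class balance of a toy dataset. The column name "Exited", the label mapping, and the helper class_balance are illustrative assumptions; they are not read from the library, and this is not how ValidMind computes anything internally.

```python
from collections import Counter


def class_balance(rows, target_column, class_labels):
    """Return the share of rows per class, keyed by human-readable label."""
    counts = Counter(str(row[target_column]) for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Fall back to the raw value when no label is defined for it.
    return {class_labels.get(value, value): count / total
            for value, count in counts.items()}


# Toy rows standing in for a churn dataset (made-up values).
rows = [
    {"Age": 42, "Exited": 1},
    {"Age": 35, "Exited": 0},
    {"Age": 51, "Exited": 0},
    {"Age": 29, "Exited": 0},
]
labels = {"0": "Did not exit", "1": "Exited"}  # assumed label mapping
print(class_balance(rows, "Exited", labels))
```

The target column tells the tooling which field is being predicted, and the labels give readable names to its raw values.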

Run the Full Data and Model Validation Test Suite

We will need to preprocess the dataset and produce the training, test and validation splits first.

Preprocess the Raw Dataset

For demonstration purposes, we simplify the preprocessing using demo_dataset.preprocess, which returns the training, validation, and test splits:

train_df, validation_df, test_df = demo_dataset.preprocess(df)
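
The internals of demo_dataset.preprocess are not shown in this notebook. As a rough illustration of what a three-way split involves, the following standard-library sketch shuffles the rows and cuts them into training, validation, and test parts. The 60/20/20 proportions, the fixed seed, and the helper name three_way_split are assumptions for this example only.

```python
import random


def three_way_split(rows, val_fraction=0.2, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into train, validation, test."""
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


train, validation, test = three_way_split(range(100))
print(len(train), len(validation), len(test))
```

The validation part is used during training (for example, for early stopping), while the test part is held back for the final evaluation.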

x_train = train_df.drop(demo_dataset.target_column, axis=1)
y_train = train_df[demo_dataset.target_column]
x_val = validation_df.drop(demo_dataset.target_column, axis=1)
y_val = validation_df[demo_dataset.target_column]

model = xgb.XGBClassifier(early_stopping_rounds=10)
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)
model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],
    verbose=False,
)
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, early_stopping_rounds=10,
              enable_categorical=False, eval_metric=['error', 'logloss', 'auc'],
              feature_types=None, gamma=None, gpu_id=None, grow_policy=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_bin=None, max_cat_threshold=None,
              max_cat_to_onehot=None, max_delta_step=None, max_depth=None,
              max_leaves=None, min_child_weight=None, missing=nan,
              monotone_constraints=None, n_estimators=100, n_jobs=None,
              num_parallel_tree=None, predictor=None, random_state=None, ...)
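
The classifier above is created with early_stopping_rounds=10 and is evaluated on the validation split during training. The sketch below illustrates the general patience idea behind early stopping in plain Python. It is a simplified illustration, not XGBoost's implementation, and the function name and scores are made up. The XGBoost documentation notes that when several evaluation metrics are given, the last one is used for early stopping, which would be "auc" here (a metric to maximize).

```python
def early_stopping_round(scores, patience, maximize=True):
    """Return (best_round, stop_round) for a list of per-round validation scores.

    Training stops once `patience` consecutive rounds pass without a strict
    improvement over the best score seen so far.
    """
    best_index = 0
    best_score = scores[0]
    for i, score in enumerate(scores[1:], start=1):
        improved = score > best_score if maximize else score < best_score
        if improved:
            best_index, best_score = i, score
        elif i - best_index >= patience:
            return best_index, i
    # Never ran out of patience: training ran through the last round.
    return best_index, len(scores) - 1


# Made-up validation AUC values for six boosting rounds.
auc_per_round = [0.70, 0.75, 0.78, 0.77, 0.78, 0.76]
print(early_stopping_round(auc_per_round, patience=3))
```

With a patience of 3, the best score is reached at round 2 and training stops at round 5, because rounds 3 to 5 bring no strict improvement.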

We can now initialize the training and test datasets into dataset objects using vm.init_dataset():

vm_train_ds = vm.init_dataset(
    dataset=train_df,
    type="generic",
    target_column=demo_dataset.target_column
)

vm_test_ds = vm.init_dataset(
    dataset=test_df,
    type="generic",
    target_column=demo_dataset.target_column
)
Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...
Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...

We also initialize a model object using vm.init_model():


vm_model = vm.init_model(
    model,
    train_ds=vm_train_ds,
    test_ds=vm_test_ds,
)

Run the Full Suite

We are now ready to run the test suite for binary classifiers with tabular datasets. This function runs test plans on the dataset and model objects and documents the results in the ValidMind UI.

full_suite = vm.run_test_suite(
    "binary_classifier_full_suite",
    dataset=vm_dataset,
    model=vm_model
)

You can access and review the resulting documentation in the ValidMind UI, in the “Model Development” section of the model documentation.